The CIST Summarization System at TAC 2011

Authors

  • Hongyan Liu
  • Ping'an Liu
  • Wei Heng
  • Lei Li
Abstract

In this report, we present our extractive summarization system for both the summarization and MultiLing tracks of TAC 2011. We introduce an extractive multi-document summarization method based on a hierarchical topic model, hierarchical Latent Dirichlet Allocation (hLDA), and on sentence compression. hLDA is a representative generative probabilistic model that can not only mine latent topics from a large amount of discrete text data but also organize these topics into a hierarchy to achieve a deeper semantic analysis. We try to combine the hLDA model with some traditional features. The evaluation results showed some improvement over our own TAC 2010 system, which was based on sentence clustering, but many problems remain to be studied in the future. For the new MultiLing task of TAC 2011, we reused the hLDA framework but removed the English-specific knowledge bases. We tried all 7 languages: Arabic, Czech, English, French, Hebrew, Hindi and Greek. The human evaluations confirmed that our method performs better than some of the other systems.
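
The idea of combining a topic-model signal with traditional features can be illustrated with a minimal sketch. This is not the CIST system: the abstract does not specify the features, the weights, or how the hLDA hierarchy is turned into a sentence score. Here `topic_score` is assumed to be precomputed by some topic model, sentence position stands in for a "traditional feature", and the weights and word budget are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    text: str
    position: int       # 0-based position within its source document
    topic_score: float  # assumed: precomputed by a topic model, in [0, 1]


def score(s: Sentence, w_topic: float = 0.7, w_pos: float = 0.3) -> float:
    """Linear mix of a topic score and a position feature (hypothetical weights)."""
    position_score = 1.0 / (1 + s.position)  # earlier sentences score higher
    return w_topic * s.topic_score + w_pos * position_score


def summarize(sentences: List[Sentence], max_words: int = 100) -> List[str]:
    """Greedily pick the highest-scoring sentences that fit the word budget."""
    chosen: List[str] = []
    used = 0
    for s in sorted(sentences, key=score, reverse=True):
        n_words = len(s.text.split())
        if used + n_words <= max_words:
            chosen.append(s.text)
            used += n_words
    return chosen
```

Selected sentences are returned in score order; a real system would typically reorder them for coherence and apply sentence compression before enforcing the length limit.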

Similar resources

The CIST Summarization System at TAC 2010

This is the first time we have participated in TAC. In this report, we present our extractive summarization system for both the initial and update summarization tracks of TAC 2010. We introduce an integrated method to generate all summaries. The TAC evaluation results show that our summarization method is feasible but needs to be improved in the future.

CIST System for CL-SciSumm 2016 Shared Task

This paper introduces the methods and experiments used in the CIST system participating in the CL-SciSumm 2016 Shared Task at BIRNDL 2016. We participated in the TAC 2014 Biomedical Summarization Track, so we developed the system based on that previous work. This time the domain is Computational Linguistics (CL). The training corpus contains 20 topics from Training-Set-2016 and Development-Set-Apr8...

WHUSUM Participation at TAC 2011 Guided Summarization Track

In this report, we present details about the participation of WHUSUM in the guided summarization track at TAC 2011. The guided summarization task requires participants to produce short, coherent summaries of news articles, guided by predefined categories and aspects for each category. This year, we extended our query-focused update summarization system with aspect-related information. In ...

Using SUMMA for Language Independent Summarization at TAC 2011

The paper describes a language-independent, multi-document, centroid-based summarization system. The system was evaluated in the 2011 TAC Multilingual Summarization pilot task, where summaries were automatically produced for document clusters in Arabic, English, French and Hindi. The system had a reasonable performance in content selection for languages such as Arabic and Hindi and medium per...

UofL at TAC 2011 Guided Summarization Task

In this paper, we describe our guided summarization system that participated in the TAC 2011 competition. We submitted two runs for the guided summarization task, both following a random walk paradigm. Two different approaches were applied for the update component to create the two runs: 1) using ROUGE (Recall-Oriented Understudy for Gisting Evaluation), and 2) using ...

Publication date: 2011